Solution: Partition and Normalize

Let’s learn how to solve the antipattern by partitioning and normalizing.

There are better ways to improve performance if a table gets too large instead of splitting the table manually. These include horizontal partitioning, vertical partitioning, and using dependent tables.

Most database brands can handle many more tables and columns than we would need if we were to use a sensible database design. If we feel that we might exceed the maximum, it’s a clear sign that we need to rethink our design.

Using horizontal partitioning#

We can gain the benefits of splitting a large table without facing any drawbacks by using a feature that is called horizontal partitioning or sharding. We define a logical table with some rules for separating rows into individual partitions, and the database manages the rest. Physically, the table is split, but we can still execute SQL statements against the table as though it were whole.

Horizontal partitioning

We have flexibility in that we can define the way each table splits its rows into separate storage. For example, using the partitioning support in MySQL version 5.1, we can specify partitions as an optional part of a CREATE TABLE statement.

Creating Bugs table using partition

The previous example achieves a partitioning similar to what we saw earlier in this chapter, i.e., separating rows based on the year in the date_reported column. However, it has advantages over splitting the table manually, in that the rows are never placed in the wrong split table, even if the value of the date_reported column is updated, and we can run queries against the Bugs table without the need to reference individual split tables.

The number of separate physical tables used to store rows is fixed at four in this example. When we have rows spanning more than four years, one of the partitions will be used to store more than one year’s worth of data. This will continue as the years go on. We don’t need to add new partitions unless the volume of data becomes so great that we feel the need to split it further.

Partitioning is not defined in the SQL standard, so each database brand implements it in its non-standard way. The terminology, syntax, and specific features of partitioning vary between brands. Nevertheless, some form of partitioning is now supported by every major brand of the database.

Using vertical partitioning#

Whereas horizontal partitioning splits a table by rows, vertical partitioning splits a table by columns. Splitting a table by columns can have advantages when some columns are bulky or seldom needed.

Vertical partitioning

The BLOB and TEXT columns have variable sizes, and they may be huge. For efficiency of both storage and retrieval, many database brands automatically store columns with these data types separately from the other columns of a given row. If we run a query without referencing the BLOB or TEXT columns of a table, we can access the other columns more efficiently. But if we use the column wildcard * in your query, the database retrieves all columns from that table, including any BLOB or TEXT columns.

For example, in the Products table of our bugs database, we might store a copy of the installation file for the respective product. This file is typically a self-extracting archive with an extension such as .exe on Windows or .dmg on a Mac. The files are usually huge, but a BLOB column can store binary data of enormous size.

Logically, the installer file should be an attribute of the Products table. But in most queries against that table, we wouldn’t need the installer. Storing such a large volume of data in the Products table, which we use infrequently, could lead to inadvertent performance problems if we’re in the habit of retrieving all columns using the * wildcard.

The remedy is to store the BLOB column in another table, separate from but dependent on the Products table. We can make its primary key serve as a foreign key to the Products table as well to ensure that there is one row per product row.

Creating ProductInstallers table

The previous example may be an extreme case to make the point, but it shows the benefits of storing some columns in a separate table. For example, in MySQL’s MyISAM storage engine, querying a table is most efficient when the rows are of a fixed size. Since VARCHAR is a variable-length data type, the presence of a single column with the VARCHAR data type in a table prevents the table from gaining that advantage. If we store all variable-length columns in a separate table, then queries against the primary table can benefit (even if only a little bit).

Creating Bugs and BugDescriptions tables

Let’s insert data in both of the tables, Bugs and BugDescriptions, as given below:

Creating Bugs and BugDescriptions tables

Now, as we have inserted the relevant data, let’s try to retrieve this data from the tables in the following playgrounds.

The following code widget shows the data available in the Bugs table. We can also retrieve the data from the BugDescriptions table, and we can also retrieve the results from both tables using JOIN. The example code is SELECT * FROM Bugs INNER JOIN BugDescriptions ON Bugs.bug_id=BugDescriptions.bug_id;.

Retrieve the data by yourself

Fixing metadata tribbles columns#

The solution to fixing metadata tribbles columns is similar to the solution we saw in the chapter Multicolumn Attributes, i.e., the remedy for metadata tribbles columns is to create a dependent table.

Creating ProjectHistory table

Instead of one row per project with multiple columns for each year, we can use multiple rows with one column for fixed bugs. If we define the table this way, we don’t need to add new columns to support subsequent years. Instead, we can store any number of rows per project in this table as time goes on.

Let’s suppose we want to run a query to search many tables at once and that all the tables have the same columns.

We would need the tables with an identical structure, and we would need to have stored them together in a single table with an extra attribute column to distinguish the rows.

Antipattern: Clone Tables or Columns
Untitled Masterpiece
Mark as Completed
Report an Issue